skill rating
PandaSkill - Player Performance and Skill Rating in Esports: Application to League of Legends
De Bois, Maxime, Parmentier, Flora, Puget, Raphaël, Tanti, Matthew, Peltier, Jordan
To take the esports scene to the next level, we introduce PandaSkill, a framework for assessing player performance and skill rating. Traditional rating systems like Elo and TrueSkill often overlook individual contributions and face challenges in professional esports due to limited game data and fragmented competitive scenes. PandaSkill leverages machine learning to estimate in-game player performance from individual player statistics. Each in-game role is modeled independently, ensuring a fair comparison between them. Then, using these performance scores, PandaSkill updates the player skill ratings using the Bayesian framework OpenSkill in a free-for-all setting. In this setting, skill ratings are updated solely based on performance scores rather than game outcomes, highlighting individual contributions. To address the challenge of isolated rating pools that hinder cross-regional comparisons, PandaSkill introduces a dual-rating system that combines players' regional ratings with a meta-rating representing each region's overall skill level. Applying PandaSkill to five years of professional League of Legends matches worldwide, we show that our method produces skill ratings that better predict game outcomes and align more closely with expert opinions than existing methods.
- Asia > China (0.07)
- Europe (0.05)
- South America > Brazil (0.04)
- Research Report (0.82)
- Overview (0.67)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
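The free-for-all update described in the abstract can be illustrated with a toy stand-in: within one game, every pair of players is treated as a head-to-head whose "winner" is the player with the higher performance score, so ratings move on performance rather than on which team won. This sketch uses a simple Elo-style logistic update in place of the paper's actual Bayesian OpenSkill machinery; the function name, constants, and data layout are all illustrative assumptions, not the authors' implementation.

```python
# Toy free-for-all rating update driven by performance scores, not by
# which team won. Hypothetical Elo-style stand-in for the Bayesian
# OpenSkill update described in the abstract.

def update_ratings(ratings, performance_scores, k=16.0):
    """Update each player's rating from a single game.

    ratings: dict player -> current rating
    performance_scores: dict player -> in-game performance score
    Every pair of players is treated as a head-to-head whose
    'winner' is the player with the higher performance score.
    """
    players = list(ratings)
    new = dict(ratings)
    for i, a in enumerate(players):
        for b in players[i + 1:]:
            # Expected score of a vs. b under a logistic model
            ea = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))
            sa = 1.0 if performance_scores[a] > performance_scores[b] else 0.0
            new[a] += k * (sa - ea)
            new[b] += k * ((1.0 - sa) - (1.0 - ea))
    return new
```

Note that a player on the losing team can still gain rating here if their performance score beats enough opponents, which is exactly the property the paper argues for.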
A State-Space Perspective on Modelling and Inference for Online Skill Rating
Duffield, Samuel, Power, Samuel, Rimella, Lorenzo
In the quantitative analysis of competitive sports, a fundamental task is to estimate the skills of the different agents ('players') involved in a given competition based on the outcomes of pairwise comparisons ('matches') between said players, often in an online setting. Skill estimation facilitates the prediction of various relevant outcomes of subsequent matches, which can then be applied towards high-level decision-making for the competition, including player seeding, fair team matching, and more. There are several established approaches to the task of skill estimation, including among others the Bradley-Terry model (Bradley and Terry, 1952), the Elo rating system (Elo, 1978), the Glicko rating system (Glickman, 1999), and TrueSkill (Herbrich et al., 2006), each with its own level of complexity and degree of statistical motivation. Skill rating is of paramount importance in the world of competitive sports, as it serves as a foundational tool for assessing and comparing the abilities of players and how they vary over time. By accurately quantifying skill levels, skill rating systems enable fair and balanced competition, inform strategic decision-making, and raise the overall level of play.
- North America > United States > Washington > King County > Seattle (0.14)
- South America > Argentina (0.04)
- Asia > Middle East > Saudi Arabia (0.04)
- Research Report (0.63)
- Workflow (0.46)
- Overview (0.46)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Tennis (0.68)
- Leisure & Entertainment > Games > Chess (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
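The state-space perspective in the abstract treats a player's skill as a latent state that drifts between matches and is corrected by each observed outcome. A minimal one-dimensional sketch of that predict/update cycle follows; it is a toy illustration of the viewpoint (a random-walk predict step plus a gradient-style probit correction), not the paper's exact inference algorithm, and all parameter names are assumptions.

```python
# Toy sketch: skill rating as online state-space inference. The skill
# is a 1-D Gaussian state that diffuses between matches (predict step)
# and is corrected by each match outcome (update step).
import math

def predict(mu, var, tau2=0.05):
    """Random-walk dynamics: skill uncertainty grows between matches."""
    return mu, var + tau2

def update(mu_a, var_a, mu_b, var_b, a_won, beta2=1.0):
    """Approximate Bayesian correction after a match between a and b."""
    # Win probability of a from the skill difference (probit link)
    c = math.sqrt(var_a + var_b + 2 * beta2)
    p_a = 0.5 * (1 + math.erf((mu_a - mu_b) / (c * math.sqrt(2))))
    s = 1.0 if a_won else 0.0
    # Correction scaled by each player's own uncertainty
    mu_a += var_a / c * (s - p_a)
    mu_b -= var_b / c * (s - p_a)
    return mu_a, mu_b
```

Under this view, Elo-like systems correspond to a fixed-gain filter, while Glicko and TrueSkill additionally track the variance term that scales the correction.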
Evaluating Team Skill Aggregation in Online Competitive Games
Dehpanah, Arman, Ghori, Muheeb Faizan, Gemmell, Jonathan, Mobasher, Bamshad
One of the main goals of online competitive games is increasing player engagement by ensuring fair matches. These games use rating systems for creating balanced match-ups. Rating systems leverage statistical estimation to rate players' skills and use skill ratings to predict rank before matching players. Skill ratings of individual players can be aggregated to compute the skill level of a team. While research often aims to improve the accuracy of skill estimation and fairness of match-ups, less attention has been given to how the skill level of a team is calculated from the skill level of its members. In this paper, we propose two new aggregation methods and compare them with a standard approach extensively used in the research literature. We present an exhaustive analysis of the impact of these methods on the predictive performance of rating systems. We perform our experiments using three popular rating systems, Elo, Glicko, and TrueSkill, on three real-world datasets including over 100,000 battle royale and head-to-head matches. Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases, implying that the overall performance of a team is best determined by the performance of its most skilled member. The results of this study highlight the necessity of devising more elaborate methods for calculating a team's performance -- methods covering different aspects of players' behavior such as skills, strategy, or goals.
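The aggregation methods compared in the abstract reduce to simple functions over member ratings. A minimal sketch, with function names of my own choosing (the paper does not publish this code): the standard additive approach versus the MAX method it finds strongest.

```python
# Toy sketch of team skill aggregation strategies. Names are
# illustrative; only the MAX method is identified in the abstract.

def team_skill_sum(member_ratings):
    """Standard approach: team skill is the sum of member ratings."""
    return sum(member_ratings)

def team_skill_mean(member_ratings):
    """Average of member ratings (equivalent ordering to SUM for
    fixed team sizes)."""
    return sum(member_ratings) / len(member_ratings)

def team_skill_max(member_ratings):
    """MAX method: the team is only as strong as its best member."""
    return max(member_ratings)
```

The practical consequence of MAX winning is that, for rank prediction, one carry-capable player appears to matter more than balanced depth across the roster.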
The Evaluation of Rating Systems in Team-based Battle Royale Games
Dehpanah, Arman, Ghori, Muheeb Faizan, Gemmell, Jonathan, Mobasher, Bamshad
Online competitive games have become a mainstream entertainment platform. To create a fair and exciting experience, these games use rating systems to match players with similar skills. While there has been an increasing amount of research on improving the performance of these systems, less attention has been paid to how their performance is evaluated. In this paper, we explore the utility of several metrics for evaluating three popular rating systems on a real-world dataset of over 25,000 team battle royale matches. Our results suggest considerable differences in their evaluation patterns. Some metrics were highly impacted by the inclusion of new players. Many could not capture the real differences between certain groups of players. Among all metrics studied, normalized discounted cumulative gain (NDCG) demonstrated more reliable performance and more flexibility. It alleviated most of the challenges faced by the other metrics while adding the freedom to adjust the focus of the evaluations on different groups of players.
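NDCG, the metric the abstract recommends, scores a predicted ranking against the actual finishing order with a logarithmic position discount. A self-contained sketch follows, assuming a simple linear relevance derived from true finishing rank (the paper may use a different relevance mapping):

```python
import math

# Toy sketch: NDCG as a rank-prediction metric for one battle royale
# match. Relevance of each player is derived from the true finishing
# rank; the rating system's predicted ordering is scored against it.

def ndcg(predicted_order, true_ranks):
    """predicted_order: players sorted best-first by the rating system.
    true_ranks: dict player -> actual finishing rank (1 = winner)."""
    n = len(predicted_order)
    rel = {p: n - true_ranks[p] for p in predicted_order}  # higher = better
    dcg = sum(rel[p] / math.log2(i + 2) for i, p in enumerate(predicted_order))
    ideal = sorted(rel.values(), reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg else 0.0
```

Because the log discount weights the top of the ranking most heavily, the evaluation can be focused on particular player groups by truncating the list, which is the flexibility the abstract highlights.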
Competitive Balance in Team Sports Games
Nikolakaki, Sofia M, Dibie, Ogheneovo, Beirami, Ahmad, Peterson, Nicholas, Aghdaie, Navid, Zaman, Kazi
Competition is a primary driver of player satisfaction and engagement in multiplayer online games. Traditional matchmaking systems aim to create matches involving teams of similar aggregated individual skill levels, such as Elo score or TrueSkill. However, team dynamics cannot be solely captured using such linear predictors. Recently, it has been shown that nonlinear predictors that aim to learn the probability of winning as a function of player and team features significantly outperform these linear skill-based methods. In this paper, we show that using final score difference provides an even better prediction metric for competitive balance. We also show that a linear model trained on a carefully selected set of team and individual features achieves nearly the performance of the more powerful neural network model while offering a two-orders-of-magnitude improvement in inference speed. This shows significant promise for implementation in online matchmaking systems.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > California > San Mateo County > Redwood City (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Leisure & Entertainment > Sports (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
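The linear model the abstract advocates can be sketched in its simplest form: fit final score difference against a single skill-difference feature by least squares, then predict at inference time with one multiply-add, which is where the speed advantage over a neural network comes from. Feature choice, function names, and data here are illustrative assumptions, not the paper's feature set.

```python
# Toy sketch: predicting final score difference from one
# skill-difference feature with a closed-form least-squares fit.

def fit_1d(xs, ys):
    """Least-squares slope and intercept for one feature.
    xs: team skill differences, ys: observed final score differences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict(model, x):
    """One multiply-add per prediction: cheap enough for live
    matchmaking loops."""
    slope, intercept = model
    return slope * x + intercept
```

A matchmaker would then prefer match-ups whose predicted score difference is close to zero, i.e. the most competitively balanced pairing available.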
Skill Rating for Generative Models
Olsson, Catherine, Bhupatiraju, Surya, Brown, Tom, Odena, Augustus, Goodfellow, Ian
We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and discriminators provide an effective way to evaluate generative models. We introduce two methods for summarizing tournament outcomes: tournament win rate and skill rating. Evaluations are useful in different contexts, including monitoring the progress of a single model as it learns during the training process, and comparing the capabilities of two different fully trained models. We show that a tournament consisting of a single model playing against past and future versions of itself produces a useful measure of training progress. A tournament containing multiple separate models (using different seeds, hyperparameters, and architectures) provides a useful relative comparison between different trained GANs. Tournament-based rating methods are conceptually distinct from numerous previous categories of approaches to evaluation of generative models, and have complementary advantages and disadvantages.
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Quebec > Montreal (0.04)
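The tournament win rate summary from the last abstract can be sketched as a table of per-pair win fractions between generator and discriminator checkpoints. The `judge` callable below is a hypothetical stand-in for one "game" (does a generator sample fool a discriminator?); the real evaluation plays this with trained GAN components.

```python
# Toy sketch: tournament win rates between generator and
# discriminator checkpoints. judge(g, d) plays one "game" and returns
# True if generator g fools discriminator d; here it is a stand-in.

def win_rate_table(generators, discriminators, judge, n_games=100):
    """Fraction of games each generator wins against each discriminator."""
    table = {}
    for g in generators:
        for d in discriminators:
            wins = sum(judge(g, d) for _ in range(n_games))
            table[(g, d)] = wins / n_games
    return table
```

Feeding such win/loss records into a standard skill-rating system (as the paper does) then collapses the full table into one comparable score per model, including across checkpoints of a single training run.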